Learning Distance Functions with Product Space Boosting
نویسندگان
چکیده
A good distance function is an essential tool in applications which involve querying large databases, such as image retrieval and bioinformatics. We describe a non-parametric algorithm for distance function learning which is based on the boosting of low grade weak learners in a product space. The algorithm learns a function defined over pairs of points, using supervision in the form of equivalence constraints. The weak learners are based on partitioning the original feature space, using a generic density estimation generative model (GMM) augmented by equivalence constraints on pairs of datapoints. Using a number of databases from the UCI repository, we show significantly improved results over methods which learn the parametric Mahalanobis distance. We also show initial results of image retrieval, using a large database of facial images (YaleB).
منابع مشابه
Multiclass Semi-supervised Boosting Using Different Distance Metrics
The goal of this thesis project is to build an effective multiclass classifier which can be trained with a small amount of labeled data and a large pool of unlabeled data by applying semi-supervised learning in a boosting framework. Boosting refers to a general method of producing a very accurate classifier by combining rough and moderately inaccurate classifiers. It has attracted a significant...
متن کاملFeature Selection in Distance Learning from Small Sample
Learning from a small sample is an acute problem which arises in many applications where acquiring new samples is difficult, time consuming or expensive. The problem becomes even harder when dealing with rich high dimensional data. The learning process in such cases is often preceded by dimensionality reduction or feature selection. The need to avoid overfitting of an algorithm to the data is c...
متن کاملیادگیری نیمه نظارتی کرنل مرکب با استفاده از تکنیکهای یادگیری معیار فاصله
Distance metric has a key role in many machine learning and computer vision algorithms so that choosing an appropriate distance metric has a direct effect on the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...
متن کاملGeneralized Dictionary for Multitask Learning with Boosting
While multitask learning has been extensively studied, most existing methods rely on linear models (e.g. linear regression, logistic regression), which may fail in dealing with more general (nonlinear) problems. In this paper, we present a new approach that combines dictionary learning with gradient boosting to achieve multitask learning with general (nonlinear) basis functions. Specifically, f...
متن کاملComposite Kernel Optimization in Semi-Supervised Metric
Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...
متن کامل